Streamlined Toolkit for Real-time Exploratory Analysis of Multiomics

Methods — STREAM v2.0 dashboard build

Marker genes

We compute per-cluster differential expression with Scanpy’s tl.rank_genes_groups using three methods (wilcoxon, t-test, logreg). For each method we contrast the selected Leiden cluster against all remaining cells, keep genes with Bonferroni-adjusted p < 0.05, and display the top hits by that method’s score (up to 50 per cluster). For readability, known aliases are appended to gene symbols in the table; this affects display only.
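The filtering and ranking step described above can be sketched with NumPy alone. This is a minimal illustration, not the dashboard's build code: the function name top_markers and the toy scores and p-values are hypothetical, and in practice the raw scores and p-values would come from rank_genes_groups.

```python
import numpy as np

def top_markers(genes, scores, pvals, alpha=0.05, n_top=50):
    """Bonferroni-adjust p-values, keep significant genes, rank by score.

    Hypothetical helper for illustration only.
    """
    genes = np.asarray(genes)
    scores = np.asarray(scores, dtype=float)
    pvals = np.asarray(pvals, dtype=float)
    # Bonferroni: multiply each p-value by the number of tests, cap at 1.
    padj = np.minimum(pvals * pvals.size, 1.0)
    keep = np.flatnonzero(padj < alpha)
    # Sort the surviving genes by descending score, then truncate.
    order = keep[np.argsort(-scores[keep], kind="stable")][:n_top]
    return [(str(genes[i]), float(scores[i]), float(padj[i])) for i in order]

# Toy input (made-up numbers): four genes tested for one cluster.
marker_hits = top_markers(
    genes=["CD3E", "MS4A1", "LYZ", "NKG7"],
    scores=[8.2, 1.1, 6.5, 7.0],
    pvals=[1e-6, 0.2, 1e-4, 0.02],
)
marker_names = [name for name, _, _ in marker_hits]
print(marker_names)
```

Note that NKG7 drops out in this toy case: its raw p-value of 0.02 becomes 0.08 after multiplying by the four tests.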

Neighbor-aware “Top markers” (new)

To make marker calls more specific to local micro-populations, we compare each cluster against its K most-connected neighboring clusters on the KNN graph (default K = 3). We build a cluster-by-cluster connectivity matrix by summing cell-level graph weights (obsp['connectivities']), pick the top-K neighbors per cluster, and run rank_genes_groups (wilcoxon) with the target cluster as the group and the pooled neighbors as the reference. Results are filtered at adjusted p ≤ 0.05 and ranked by |logFC|; we keep the top 50 per cluster. If the neighbor graph is missing, we compute PCA and pp.neighbors on the fly. If there are too few clusters to define neighbors, we fall back to cluster-vs-rest.
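The neighbor-selection step can be sketched as follows. This is an assumption-laden illustration: the function name top_k_neighbor_clusters is hypothetical, the toy graph is a small dense array (obsp['connectivities'] is normally sparse), and within-cluster edges are ignored by zeroing the diagonal.

```python
import numpy as np

def top_k_neighbor_clusters(weights, labels, k=3):
    """Sum cell-level graph weights per cluster pair, then pick top-k neighbors.

    Hypothetical helper; `weights` is a dense cells x cells array here.
    """
    labels = np.asarray(labels)
    clusters, codes = np.unique(labels, return_inverse=True)
    # One-hot membership matrix: cells x clusters.
    member = np.zeros((labels.size, clusters.size))
    member[np.arange(labels.size), codes] = 1.0
    # Cluster-by-cluster connectivity: total edge weight between cluster pairs.
    conn = member.T @ np.asarray(weights, dtype=float) @ member
    np.fill_diagonal(conn, 0.0)  # ignore within-cluster edges
    neighbors = {}
    for i, name in enumerate(clusters):
        order = np.argsort(-conn[i], kind="stable")
        picked = [j for j in order if j != i and conn[i, j] > 0][:k]
        neighbors[str(name)] = [str(clusters[j]) for j in picked]
    return conn, neighbors

# Toy graph: six cells in three clusters, symmetric edge weights.
toy_labels = ["0", "0", "1", "1", "2", "2"]
toy_w = np.zeros((6, 6))
for a, b, w in [(0, 1, 1.0), (1, 2, 0.5), (0, 2, 0.3),
                (2, 3, 1.0), (3, 4, 0.2), (4, 5, 1.0)]:
    toy_w[a, b] = toy_w[b, a] = w

toy_conn, toy_neighbors = top_k_neighbor_clusters(toy_w, toy_labels, k=1)
print(toy_neighbors)
```

A cluster with no cross-cluster edges gets an empty neighbor list in this sketch, which is where a cluster-vs-rest fallback would apply.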

Cell-type prediction (enrichment)

We score cell-type signatures with decoupler’s ULM method against a curated PanglaoDB marker resource (human entries, canonical markers only, sensitivity > 0.5, duplicates removed). ULM scores are computed per cell (dc.mt.ulm, tmin=3), then summarized per cluster using rankby_group (t-test with overestimated variance); only positive-stat entries are kept. The “Predicted annotation” tab shows (i) a UMAP colored by per-cell enrichment for the selected cell type and (ii) per-cluster score distributions. We list the top 5 candidate cell types per cluster as suggestions, not hard labels.
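For intuition, a univariate linear model (ULM) score can be sketched in NumPy: for each cell, expression across genes is regressed on the signature's weight vector, and the t-statistic of the slope is the score. This is a simplified sketch of the idea, not decoupler's implementation; the function name ulm_t_scores and the toy data are hypothetical, and details such as the tmin filter are omitted.

```python
import numpy as np

def ulm_t_scores(expr, weights):
    """Slope t-statistic of expr ~ weights, one value per cell.

    expr: cells x genes; weights: one weight per gene (0 for non-members).
    Hypothetical helper; assumes no cell has a constant profile or a
    perfect correlation with the weights (which would divide by zero).
    """
    expr = np.asarray(expr, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = w.size
    wc = w - w.mean()
    ec = expr - expr.mean(axis=1, keepdims=True)
    # Pearson correlation between each cell's profile and the weight vector.
    r = (ec @ wc) / (np.linalg.norm(ec, axis=1) * np.linalg.norm(wc))
    # In simple linear regression the slope t-statistic depends only on r and n.
    return r * np.sqrt((n - 2) / (1.0 - r**2))

# Toy data: six genes, the first three belong to the signature.
sig_weights = [1, 1, 1, 0, 0, 0]
toy_expr = [
    [5, 4, 6, 1, 0, 2],  # cell expressing the signature genes
    [1, 0, 2, 5, 4, 6],  # cell expressing the other genes
]
ulm_scores = ulm_t_scores(toy_expr, sig_weights)
print(ulm_scores)
```

The first cell gets a positive score and the second a negative one of equal magnitude, since their profiles mirror each other relative to the signature.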

Transcription-factor programs

TF activity is estimated with decoupler ULM using the human CollecTRI regulon network. We compute per-cell TF scores, then rank TFs per cluster (same statistics as above) and retain the top 5 TFs as “program markers.” The TF tab shows a UMAP of the selected TF’s activity and a violin panel with cluster-wise distributions.
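The per-cluster ranking step can be illustrated with a t-test whose variance is deliberately overestimated. The sketch below assumes the variant in which the variance of the rest-of-cells group is divided by the target group's size rather than its own; that is our reading of the "overestimated variance" test and should be checked against the library in use. The function name rank_sources, the positive-statistic filter for TFs, and the toy scores are assumptions for illustration.

```python
import numpy as np

def rank_sources(scores, labels, cluster, names, n_top=5):
    """Rank sources (TFs or cell types) for one cluster by a t-statistic.

    scores: cells x sources activity matrix. Hypothetical helper.
    """
    scores = np.asarray(scores, dtype=float)
    in_group = np.asarray(labels) == cluster
    grp, rest = scores[in_group], scores[~in_group]
    n_grp = grp.shape[0]
    mean_diff = grp.mean(axis=0) - rest.mean(axis=0)
    # Welch-style standard error, except that the rest-group variance is
    # also divided by the target group size (assumed overestimation rule).
    se = np.sqrt(grp.var(axis=0, ddof=1) / n_grp
                 + rest.var(axis=0, ddof=1) / n_grp)
    t = mean_diff / se
    # Keep positive statistics only, highest first.
    order = [j for j in np.argsort(-t, kind="stable") if t[j] > 0][:n_top]
    return [(names[j], float(t[j])) for j in order]

# Toy activity scores (made-up): six cells, three TFs, two clusters.
tf_names = ["SPI1", "GATA3", "PAX5"]
tf_scores = [
    [2.0, 0.5, -1.0], [2.2, 0.6, -1.2], [1.8, 0.4, -0.8],   # cluster "a"
    [0.1, 0.1, 1.0], [0.0, 0.3, 1.1], [-0.1, -0.1, 0.9],    # cluster "b"
]
tf_labels = ["a", "a", "a", "b", "b", "b"]
tf_ranked = rank_sources(tf_scores, tf_labels, "a", tf_names)
print([name for name, _ in tf_ranked])
```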

Co-expression & gene-set analysis

Co-expression: given Gene A and Gene B, cells are colored either (a) in bivariate mode using independent quantile thresholds for A and B (A-high, B-high, both-high, low) or (b) in ratio mode using A/B (or log2(A/B)). Controls include per-gene clipping to upper quantiles, binarization, a global scale option, and an Otsu-based auto-threshold “Suggest” button for each gene. An auxiliary table lists cells with the most extreme A/B ratios.
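The two coloring modes and the auto-threshold can be sketched in NumPy. This is an illustration under stated assumptions, not the dashboard's code: the function names, the 64-bin histogram in the Otsu step, and the pseudocount in the ratio are all hypothetical choices.

```python
import numpy as np

def otsu_threshold(values, bins=64):
    """Otsu's method: choose the cut that maximizes between-class variance."""
    values = np.asarray(values, dtype=float)
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)              # cell counts at or below each bin
    w1 = w0[-1] - w0                  # cell counts above each bin
    s0 = np.cumsum(hist * centers)
    valid = (w0 > 0) & (w1 > 0)
    if not valid.any():               # all values fall in a single bin
        return float(values.mean())
    mu0 = s0[valid] / w0[valid]
    mu1 = (s0[-1] - s0[valid]) / w1[valid]
    between = w0[valid] * w1[valid] * (mu0 - mu1) ** 2
    return float(edges[1:][valid][np.argmax(between)])

def bivariate_labels(a, b, q_a=0.75, q_b=0.75):
    """Classify cells with independent quantile thresholds for genes A and B."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    hi_a = a > np.quantile(a, q_a)
    hi_b = b > np.quantile(b, q_b)
    labels = np.full(a.shape, "low", dtype=object)
    labels[hi_a & ~hi_b] = "A-high"
    labels[~hi_a & hi_b] = "B-high"
    labels[hi_a & hi_b] = "both-high"
    return labels

def log2_ratio(a, b, pseudocount=1.0):
    """log2(A/B) with a pseudocount (an assumption) to avoid division by zero."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.log2((a + pseudocount) / (b + pseudocount))

# Toy expression vectors for eight cells (made-up values).
gene_a = [0, 0, 5, 6, 0, 7, 0, 0]
gene_b = [0, 4, 0, 5, 0, 6, 0, 0]
coexp_labels = bivariate_labels(gene_a, gene_b, q_a=0.5, q_b=0.5)
coexp_ratio = log2_ratio(gene_a, gene_b)
suggested_a = otsu_threshold(gene_a)
print(list(coexp_labels))
```

In this toy case the Otsu cut for gene A falls between the zero-expressing cells and the expressing ones, which is the behavior a "Suggest" control would rely on.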

Gene-set scoring: the “Signature” panel computes a per-cell mean Z-score across the pasted gene list (each gene is Z-scored across cells; the signature score is their mean). The resulting score is rendered on UMAP and also registered as a virtual gene so it can be reused elsewhere in the dashboard.
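The signature score described above reduces to a few lines of NumPy. A minimal sketch, with assumptions flagged: the function name signature_score is hypothetical, genes missing from the dataset are skipped, zero-variance genes contribute 0, and the population standard deviation (ddof=0) is used.

```python
import numpy as np

def signature_score(expr, gene_names, signature):
    """Mean per-gene Z-score across a gene list, one value per cell.

    expr: cells x genes matrix; gene_names: column labels. Hypothetical helper.
    """
    expr = np.asarray(expr, dtype=float)
    # Assumption: silently skip signature genes absent from the dataset.
    idx = [gene_names.index(g) for g in signature if g in gene_names]
    if not idx:
        raise ValueError("no signature genes found in the dataset")
    sub = expr[:, idx]
    sd = sub.std(axis=0)
    sd[sd == 0] = 1.0  # constant genes contribute 0 instead of NaN
    z = (sub - sub.mean(axis=0)) / sd   # Z-score each gene across cells
    return z.mean(axis=1)               # average the Z-scores per cell

# Toy matrix: four cells, three genes (made-up values).
sig_genes = ["CD3D", "CD3E", "LYZ"]
sig_expr = [
    [2, 4, 0],
    [2, 4, 5],
    [0, 0, 5],
    [0, 0, 0],
]
sig_score = signature_score(sig_expr, sig_genes, ["CD3D", "CD3E", "NOT_PRESENT"])
print(sig_score)
```

Because each gene is standardized before averaging, a highly expressed gene does not dominate the score.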

AI (A1) insights — optional

If enabled at build time (“Generate biology insights”), we produce cluster-wise narrative summaries with a large language model (Gemini family with automatic fallbacks). Only cluster-level summaries are provided to the model: cluster sizes; neighbor-aware top markers; method-specific marker tables; the per-cluster PanglaoDB enrichment ranks; and the top TFs. Raw counts or per-cell expression matrices are not sent. The model returns concise Markdown with a proposed label (± a refined subtype), confidence (High/Medium/Low), and supporting markers. Treat AI text as suggestive and validate against the Markers, Predicted annotation, and TF tabs.

Rendering notes: expression vectors are quantized for sparsity-aware transport and colored with a grey→rainbow palette; QC and composition views reflect the currently selected layer.

STREAM

Interactive data mining and analysis of bulk and single-cell expression data — delivered as a fast, shareable HTML dashboard with UMAPs, marker discovery, enrichment, TF programs, and co-expression exploration.

About the toolkit
  • QC: gating, batch view, per-cluster summaries
  • Markers: Wilcoxon / t-test / logreg + “top markers”
  • Enrichment: PanglaoDB-based cell-type scoring
  • TF: CollecTRI ULM programs per cluster
  • Co-expression: bivariate & ratio modes with contours
  • Insights: optional AI-assisted, cluster-wise summaries
About this dataset
Dataset: PBMC
Layer: log1pPF_normalization
Cells
Generated: 2025-08-19T13:16:19
Version: 2.0
Notes: values shown reflect the current HTML build; analyses respect the selected layer.
Contributors
Support & Acknowledgements

Single Cell MultiOmics Lab

If you use STREAM in your work, please cite: STREAM (v2.0) — Streamlined Toolkit for Real-time Exploratory Analysis of Multiomics, generated 2025-08-19T13:16:19.

Legal & Licensing

Disclaimer. STREAM is a research and education tool. It is not intended for clinical, diagnostic, or patient-management decisions.

  • AI Insights (if enabled) are LLM-generated summaries that can be incomplete or incorrect; always validate with markers, enrichment and TF programs.
  • Privacy. This dashboard does not upload sample-level data to a server; rendering happens in your browser. (If you need strict offline use, bundle local JS/CSS assets instead of CDNs.)
  • Upstream AI usage. If you tick “Generate biology insights” during build, cluster-level summaries (markers, enrichment ranks, etc.) are sent to the configured AI provider to produce text; raw counts are not sent.
  • No warranty. Provided “AS IS” without warranty of any kind; you are responsible for verifying results and for compliance with any data-use terms.
STREAM code license: MIT License
Dataset license: CC BY 4.0 (Example – replace per dataset)
Make your omics flow